15th June 2022
Image source: https://web.genewiz.com/single-cell-faq
Aim of QC
Above is achieved by …
Bioconductor R packages:
Orchestrating Single-Cell Analysis with Bioconductor (Robert Amezquita, Aaron Lun, Stephanie Hicks, Raphael Gottardo)
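CellRanger outputs: two count matrix folders, raw (raw_feature_bc_matrix, all barcodes) and filtered (filtered_feature_bc_matrix, only barcodes called as cells)
Each folder has three gzipped files: barcodes.tsv.gz, features.tsv.gz and matrix.mtx.gz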
To access counts from sce object: counts(sce)
To access gene metadata from sce object: rowData(sce)
To access cell metadata from sce object: colData(sce)
Although the count matrix has 36601 genes, many of these will not have been detected in any droplet. We can remove these to reduce the size of the count matrix.
undetected_genes <- rowSums(counts(sce)) == 0
sce <- sce[!undetected_genes,]
sce
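The same filter can be tried on a toy matrix using base R only. This is a minimal sketch: the matrix, gene names and droplet names below are invented for illustration, and with a real sce object counts(sce) would take the place of toy_counts.

```r
# Toy count matrix: 4 genes (rows) x 3 droplets (columns); values are invented.
toy_counts <- matrix(
  c(0, 0, 0,
    5, 0, 2,
    0, 0, 0,
    1, 3, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("cell", 1:3))
)

# A gene is undetected when its total count across all droplets is zero.
undetected_genes <- rowSums(toy_counts) == 0

# Keep only the detected genes (rows); drop = FALSE keeps the matrix shape.
toy_filtered <- toy_counts[!undetected_genes, , drop = FALSE]

table(undetected_genes)  # how many genes are dropped and kept
dim(toy_filtered)        # genes x droplets after filtering
```

Checking table(undetected_genes) before subsetting is a cheap way to confirm how many genes the filter will remove.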
AnnotationHub
rowData(sce)
is.mito <- which(rowData(sce)$Chromosome=="MT")
sce <- addPerCellQC(sce, subsets = list(Mito = is.mito))
Adds six columns to the droplet annotation: sum, detected, subsets_Mito_sum, subsets_Mito_detected, subsets_Mito_percent, total
colData(sce)
## DataFrame with 3094 rows and 8 columns
##                        Sample            Barcode   sum detected subsets_Mito_sum
## AAACCTGAGACTTTCG-1 SRR9264343 AAACCTGAGACTTTCG-1  6677     2056              292
## AAACCTGGTCTTCAAG-1 SRR9264343 AAACCTGGTCTTCAAG-1 12064     3177              575
## AAACCTGGTGCAACTT-1 SRR9264343 AAACCTGGTGCAACTT-1   843      363              428
## AAACCTGGTGTTGAGG-1 SRR9264343 AAACCTGGTGTTGAGG-1  8175     2570              429
## AAACCTGTCCCAAGTA-1 SRR9264343 AAACCTGTCCCAAGTA-1  8638     2389              526
## ...                       ...                ...   ...      ...              ...
## TTTGGTTTCTTTAGGG-1 SRR9264343 TTTGGTTTCTTTAGGG-1  3489     1600              239
## TTTGTCAAGAAACGAG-1 SRR9264343 TTTGTCAAGAAACGAG-1  7809     2415              548
## TTTGTCAAGGACGAAA-1 SRR9264343 TTTGTCAAGGACGAAA-1  9486     2589              503
## TTTGTCACAGGCTCAC-1 SRR9264343 TTTGTCACAGGCTCAC-1  1182      591              224
## TTTGTCAGTTCGGCAC-1 SRR9264343 TTTGTCAGTTCGGCAC-1 10514     2831              484
##                    subsets_Mito_detected subsets_Mito_percent total
## AAACCTGAGACTTTCG-1                    12              4.37322  6677
## AAACCTGGTCTTCAAG-1                    12              4.76625 12064
## AAACCTGGTGCAACTT-1                    11             50.77106   843
## AAACCTGGTGTTGAGG-1                    12              5.24771  8175
## AAACCTGTCCCAAGTA-1                    13              6.08937  8638
## ...                                  ...                  ...   ...
## TTTGGTTTCTTTAGGG-1                    11              6.85010  3489
## TTTGTCAAGAAACGAG-1                    12              7.01754  7809
## TTTGTCAAGGACGAAA-1                    12              5.30255  9486
## TTTGTCACAGGCTCAC-1                    11             18.95093  1182
## TTTGTCAGTTCGGCAC-1                    12              4.60339 10514
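To make the meaning of these columns concrete, they can be recomputed by hand on a toy matrix with base R. This is a sketch of what the metrics represent, not the addPerCellQC implementation; the counts, gene names and the choice of which rows count as mitochondrial are all invented.

```r
# Toy counts: 5 genes x 3 droplets; the last two genes are treated as mitochondrial.
toy_counts <- matrix(
  c(10, 0, 1,
     4, 2, 0,
     0, 1, 0,
     3, 6, 0,
     3, 1, 1),
  nrow = 5, byrow = TRUE,
  dimnames = list(c("g1", "g2", "g3", "mt1", "mt2"), paste0("cell", 1:3))
)
is_mito <- c(4, 5)  # row indices of the (hypothetical) mitochondrial genes
mito_counts <- toy_counts[is_mito, , drop = FALSE]

qc <- data.frame(
  sum                   = colSums(toy_counts),      # library size per droplet
  detected              = colSums(toy_counts > 0),  # genes with a non-zero count
  subsets_Mito_sum      = colSums(mito_counts),     # counts from mitochondrial genes
  subsets_Mito_detected = colSums(mito_counts > 0)  # mitochondrial genes detected
)

# Percentage of each droplet's counts that come from mitochondrial genes.
qc$subsets_Mito_percent <- 100 * qc$subsets_Mito_sum / qc$sum
qc
```

The same arithmetic applies to the real output: for the first barcode, 100 * 292 / 6677 is about 4.37, which matches its subsets_Mito_percent value.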
plotColData(sce, x="Sample", y="sum") + scale_y_log10()
plotColData(sce, x="Sample", y="detected") + scale_y_log10()
plotColData(sce, x="Sample", y="subsets_Mito_percent")
sce$low_lib_size <- isOutlier(sce$sum, log=TRUE, type="lower")
sce$low_n_features <- isOutlier(sce$detected, log=TRUE, type="lower")
sce$high_Mito_percent <- isOutlier(sce$subsets_Mito_percent, type="higher")
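isOutlier flags values that lie more than a set number of median absolute deviations (MADs) from the median, 3 by default. The base R sketch below shows that idea for the type="lower", log=TRUE case. The helper name is_low_outlier is hypothetical, the library sizes are invented, and the log2 transform is an assumption that does not affect which values are flagged here; this is not scuttle's code.

```r
# Hypothetical helper: flag values more than `nmads` MADs below the median,
# optionally after a log2 transform. A sketch of the idea only.
is_low_outlier <- function(x, nmads = 3, log = FALSE) {
  if (log) x <- log2(x)
  centre <- median(x)
  spread <- mad(x)  # scaled MAD (constant 1.4826), comparable to a standard deviation
  x < centre - nmads * spread
}

# Invented library sizes: five typical droplets and one very small one.
lib_sizes <- c(6000, 8000, 9000, 10000, 12000, 300)

# Only the last droplet falls below the lower threshold.
is_low_outlier(lib_sizes, log = TRUE)
```

Because the threshold is derived from the data, it adapts to each dataset, but it assumes that most droplets are of acceptable quality.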
cell_qc_results <- quickPerCellQC(colData(sce), percent_subsets=c("subsets_Mito_percent"))
sce.Filtered <- sce[, !cell_qc_results$discard]
sce.Filtered
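The column subsetting used for the final filter can be sketched in base R. A droplet is discarded if it fails any of the individual filters. The flags and counts below are invented, and a plain matrix stands in for the sce object.

```r
# Invented per-droplet flags, mirroring the three QC filters used above.
flags <- data.frame(
  low_lib_size      = c(FALSE, TRUE,  FALSE, FALSE),
  low_n_features    = c(FALSE, TRUE,  FALSE, FALSE),
  high_Mito_percent = c(FALSE, FALSE, TRUE,  FALSE),
  row.names = paste0("cell", 1:4)
)

# Discard a droplet if it fails at least one filter.
flags$discard <- flags$low_lib_size | flags$low_n_features | flags$high_Mito_percent

# Toy count matrix with one column per droplet (values are arbitrary).
toy_counts <- matrix(1:8, nrow = 2,
                     dimnames = list(c("g1", "g2"), rownames(flags)))

# Keep only the droplets that passed every filter.
toy_filtered <- toy_counts[, !flags$discard, drop = FALSE]

colSums(flags)          # number of droplets failing each filter, and in total
colnames(toy_filtered)  # droplets retained
```

Tabulating the flags before filtering shows which criterion removes the most droplets, which is worth checking before accepting the result.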